Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression

Authors

  • Gu Mi
  • Yanming Di
  • Sarah Emerson
  • Jason S. Cumbie
  • Jeff H. Chang
Abstract

When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
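The adjustment described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation. We take GO category membership as the binary response and a binary differential-expression (DE) call plus standardized log transcript length as covariates, i.e. logit P(gene in category) = b0 + b1·DE + b2·log(length), and read the Wald z-statistic of b1 as the enrichment signal. The data are simulated so that length drives both category membership and DE calls while the two are independent given length. The function names (`fit_logistic`, `go_enrichment_z`) and all simulation parameters are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Returns (coefficients, Wald standard errors).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = sigmoid(X @ beta)
    info = X.T @ (X * (p * (1 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def go_enrichment_z(in_category, is_de, log_length=None):
    """Wald z for the DE coefficient in a logistic model of GO membership.

    Without log_length the test is unadjusted; with it, the association
    between membership and DE status is measured conditional on length.
    """
    cols = [np.ones_like(is_de, dtype=float), is_de.astype(float)]
    if log_length is not None:
        cols.append(log_length)
    X = np.column_stack(cols)
    beta, se = fit_logistic(X, in_category.astype(float))
    return beta[1] / se[1]


# Hypothetical simulation: length influences both variables, nothing else links them.
rng = np.random.default_rng(2012)
n = 20000
log_len = rng.normal(0.0, 1.0, n)                         # standardized log transcript length
in_cat = rng.random(n) < sigmoid(-1.5 + 1.0 * log_len)    # longer genes more often in the category
is_de = rng.random(n) < sigmoid(-1.0 + 1.5 * log_len)     # length bias: longer genes more often called DE

z_naive = go_enrichment_z(in_cat, is_de)
z_adjusted = go_enrichment_z(in_cat, is_de, log_len)
print(f"unadjusted z = {z_naive:.1f}, length-adjusted z = {z_adjusted:.1f}")
```

On this simulated data the unadjusted model should report a large positive z (a spurious enrichment driven purely by length), while adding log length as a covariate should pull the z-statistic back toward zero, matching the confounding argument in the abstract. Newton-Raphson is used here only to keep the sketch dependency-light; a standard binomial GLM routine would serve equally well.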

Similar articles

The necessity of adjusting tests of protein category enrichment in discovery proteomics

MOTIVATION Enrichment tests are used in high-throughput experimentation to measure the association between gene or protein expression and membership in groups or pathways. The Fisher's exact test is commonly used. We specifically examined the associations produced by the Fisher test between protein identification by mass spectrometry discovery proteomics, and their Gene Ontology (GO) term assig...

Bias in microRNA functional enrichment analysis

MOTIVATION Many studies have investigated the differential expression of microRNAs (miRNAs) in disease states and between different treatments, tissues and developmental stages. Given a list of perturbed miRNAs, it is common to predict the shared pathways on which they act. The standard test for functional enrichment typically yields dozens of significantly enriched functional categories, many ...

Supplementary Material for Learning the Dependency Structure of Latent Factors

In Section 4.4, SLFA achieves state-of-the-art result on the classification of microarray data, without using extra biological information. It outperforms Lasso-overlapped-group Jacob et al. (2009), a logistic regression approach with the graph-guided regularization, which utilizes a known biological network. This result indicates that SLFA might automatically explore and discover the deep info...

POEAS: Automated Plant Phenomic Analysis Using Plant Ontology

Biological enrichment analysis using gene ontology (GO) provides a global overview of the functional role of genes or proteins identified from large-scale genomic or proteomic experiments. Phenomic enrichment analysis of gene lists can provide an important layer of information as well as cellular components, molecular functions, and biological processes associated with gene lists. Plant phenomi...

Bias Correction in Group Sequential Analysis with Correlated Data

This paper focuses on the bias of the group sequential estimate of treatment effect for correlated data using the generalized estimating equation (GEE) method and the Lan and DeMets alpha-spending function. Linear and logistic regressions are used to examine (a) the magnitude of the bias of a sequential estimate with correlated data; (b) the influence of the true correlation structure on bias. ...

Volume 7

Publication date: 2012